Technical comment to "Database verification studies of SWISS-PROT and GenBank" by Karp et al
Abstract
In their paper “Database verification studies of SWISS-PROT and GenBank”, Karp et al. (2001) conclude: (1) “SWISS-PROT is more incomplete than we expected...”; (2) “Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset”; (3) “In many cases, translated GenBank genes do not exactly match the corresponding SWISS-PROT sequences, ...”; and (4) “...that SWISS-PROT does not identify a significant number of experimentally characterized proteins”. These results, and the approach used to arrive at them, are in our opinion somewhat misleading. Here we focus on only four major points. First, there has never been a claim that SWISS-PROT is comprehensive. Thus, it is surprising that Karp et al. found that “SWISS-PROT is more incomplete than we expected...”. To make sequences available as quickly as possible without diluting the quality of SWISS-PROT, the supplemental database TrEMBL was introduced in 1996; it contains the translations of all coding sequences (CDS) in the DDBJ/EMBL/GenBank nucleotide sequence database, except those already included in SWISS-PROT. Snapshots of the SWISS-PROT, TrEMBL and TrEMBLnew databases are released weekly, synchronised with the DDBJ/EMBL/GenBank nucleotide sequence database, and provide comprehensive coverage (ftp://ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/). The weekly comprehensive SWISS-PROT/TrEMBL non-redundant database (SPTR) has been widely publicised on the EBI and ExPASy web servers and in various publications (e.g. Apweiler, 2000). Second, the authors’ assertions that “Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset” and that “SWISS-PROT curators apparently chose not to replace existing SWISS-PROT sequences with sequences from complete-genome projects” are rather inaccurate. Karp et al. tried to establish corresponding sets of SWISS-PROT/TrEMBL proteins and ...
Journal: Bioinformatics
Volume: 17
Publication date: 2001